Abstract

This article considers the recovery of signals that are sparse or approximately sparse in terms of
a (possibly) highly overcomplete and coherent tight frame from undersampled data corrupted
with additive noise. We show that the properly constrained $\ell_1$-analysis, called the analysis Dantzig
selector, stably recovers a signal that is nearly sparse in terms of a tight frame, provided that
the measurement matrix satisfies a restricted isometry property adapted to the tight frame. As
a special case, we consider Gaussian noise. Further, under a sparsity scenario, with high
probability, the recovery error from noisy data is within a log-like factor of the minimax risk
over the class of vectors that are at most $s$-sparse in terms of the tight frame. Similar results
for the analysis LASSO are shown.

The above two algorithms provide guarantees only for noise that is bounded or bounded with 
high probability (for example, Gaussian noise). However, when the underlying measurements 
are corrupted by sparse noise, these algorithms perform suboptimally. We demonstrate robust 
methods for reconstructing signals that are nearly sparse in terms of a tight frame in the presence 
of bounded noise combined with sparse noise. The analysis in this paper is based on the 
restricted isometry property adapted to a tight frame, which is a natural extension to the 
standard restricted isometry property. 

Keywords. $\ell_1$-analysis, Restricted isometry property, Sparse recovery, Dantzig selector, LASSO,
Gaussian noise, Sparse noise.



1 Introduction 

1.1 Standard compressed sensing 

Compressed sensing predicts that sparse signals can be reconstructed from what was previously
believed to be incomplete information. The seminal papers [121 EH] have triggered a large research
activity in mathematics, engineering and computer science with many potential applications.
Formally, in compressed sensing, one considers the following model:

$$y = Af + z, \tag{1.1}$$

where $A$ is a known $m \times n$ measurement matrix (with $m \ll n$) and $z \in \mathbb{R}^m$ is a vector of measurement
errors. The goal is to reconstruct the unknown signal $f$ based on $y$ and $A$. The key idea is that
sparsity helps in isolating the original signal under suitable conditions on $A$.

The approach to this problem that probably comes first to mind is to search for the sparsest
vector in the feasible set of possible solutions, which leads to an $\ell_0$-minimization problem. However,
solving $\ell_0$-minimization directly is NP-hard in general and thus computationally infeasible [42|, B3].
It is then natural to consider $\ell_1$-minimization, which can be viewed as a convex
relaxation of $\ell_0$-minimization. The three best-known recovery algorithms based on convex relaxation
proposed in the literature are the Basis Pursuit (BP) [7], the Dantzig selector (DS) [15], and the
LASSO estimator [52] (or Basis Pursuit Denoising [7]):

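$$\text{(BP)}: \quad \min_{f \in \mathbb{R}^n} \|f\|_1 \quad \text{subject to} \quad \|Af - y\|_2 \le \varepsilon,$$

$$\text{(DS)}: \quad \min_{f \in \mathbb{R}^n} \|f\|_1 \quad \text{subject to} \quad \|A^*(Af - y)\|_\infty \le \lambda_n \sigma,$$

$$\text{(LASSO)}: \quad \min_{f \in \mathbb{R}^n} \tfrac{1}{2}\|Af - y\|_2^2 + \mu_n \sigma \|f\|_1,$$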

Here $\|\cdot\|_2$ denotes the standard Euclidean norm, $\|\cdot\|_1$ is the $\ell_1$-norm, $\lambda_n$ (or $\mu_n$) is a tuning
parameter, and $\varepsilon$ (or $\sigma$) is a measure of the noise level. All three optimization programs can
be implemented efficiently using convex programming or even linear programming.

It is now well known that the BP recovers all (approximately) $s$-sparse vectors with small
or zero error provided that the measurement matrix $A$ satisfies a restricted isometry property
(RIP) condition $\delta_{cs} < \delta$ for some constants $c, \delta > 0$ and that the error bound $\|z\|_2$ is small
[13 EH El HQ [281 SO]. Similar results were obtained for the DS and the LASSO provided that $A$
satisfies the RIP condition $\delta_{cs} < \delta$ for some constants $c, \delta > 0$ and that the error bound $\|A^* z\|_\infty$ is
small [151 El US]. Recall that for an $m \times n$ matrix $A$ and $s \le n$, the RIP constant $\delta_s$ [111 [T4l [20] is
defined as the smallest number $\delta$ such that for all $s$-sparse vectors $x \in \mathbb{R}^n$,

$$(1 - \delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta)\|x\|_2^2.$$

So far, all good constructions of matrices with the RIP use randomness. It is well known |14[
El HT1 H9] that many types of random measurement matrices, such as Gaussian or sub-Gaussian
matrices, have RIP constant $\delta_s < \delta$ with overwhelming probability provided that
$m \ge C\delta^{-2} s \log(n/s)$. Up to the constant, the lower bounds for Gelfand widths of $\ell_1$-balls [30, 29]
show that this dependence on $n$ and $s$ is optimal. The partial random Fourier matrix, which admits
fast multiplication, has RIP constant $\delta_s < \delta$ with very high probability provided that
$m \ge C\delta^{-2} s (\log n)^4$ [14, 49, 33].

In many common settings it is natural to assume that the noise vector $z \sim N(0, \sigma^2 I)$, i.e., $z$ is
i.i.d. Gaussian noise, which is of particular interest in signal processing and in statistics. The case
of Gaussian noise was first considered in [32], which examined the performance of $\ell_0$-minimization
with noisy measurements. Since Gaussian noise is essentially bounded (e.g. [13 [IT]), all stable
recovery results mentioned above for bounded error related to the BP, the DS and the LASSO can
be extended directly to the Gaussian noise case. While the BP and the DS (or the LASSO) provide
very similar guarantees, there are certain circumstances where the DS is preferable, since the DS
yields a bound that is adaptive to the unknown level of sparsity of the object we try to recover and
thus provides a stronger guarantee when $s$ is small [15]. In addition, Candes and Tao [15] established
an oracle inequality for the DS. Bickel et al. [5] showed that the DS and the LASSO have analogous
properties, which lead to analogous error bounds.

The above mentioned recovery algorithms provide guarantees only for noise that is bounded 
or bounded with high probability. However, these algorithms perform suboptimally when the 
measurement noise is also sparse [36] . This can occur in practice due to shot noise, malfunctioning 
hardware, transmission errors, or narrowband interference. Several recovery techniques have been 
developed for sparse noise [Ml ED E5]. We refer the reader to [Ml ED E5] and the references therein
for more details on sparse noise.

There are many other algorithmic approaches to compressed sensing based on pursuit algorithms
in the literature, including Orthogonal Matching Pursuit (OMP) [37, 23], Stagewise OMP [23],
Regularized OMP [36], Compressive Sampling Matching Pursuit [45], Iterative Hard Thresholding
[2], Subspace Pursuit [22] and many other variants. Refer to [54] for an overview of these pursuit
methods.

1.2 $\ell_1$-synthesis

For signals which are sparse in the standard coordinate basis or sparse in terms of some other
orthonormal basis, the techniques above hold. However, in practical examples, there are numerous
signals of interest which are not sparse in an orthonormal basis. More often than not, sparsity is
expressed not in terms of an orthogonal basis but in terms of an overcomplete dictionary, which
means that our signal $f \in \mathbb{R}^n$ is now expressed as $f = Dx$, where $D \in \mathbb{R}^{n \times d}$ ($d > n$) is a redundant
dictionary and $x$ is (approximately) sparse; see e.g. [TJ HJ [8] and the references therein. Examples
include signal modeling in array signal processing (oversampled array steering matrix), reflected
radar and sonar signals (Gabor frames), and images with curves (curvelet frames).

The $\ell_1$-synthesis (e.g. [71 SHI [25]) consists in finding the sparsest possible coefficient vector $x$ by solving
an $\ell_1$-minimization problem (BP or LASSO) with the decoding matrix $AD$ instead of $A$, and then
reconstructing the signal by a synthesis operation, i.e., $\hat f = D\hat x$. Empirical studies show that
$\ell_1$-synthesis often provides good recovery [3 [25]. But little is known about the theoretical performance
of this method. In [38], recovery results were obtained that essentially require the frame $D$
to have columns that are extremely uncorrelated, so that $AD$ satisfies the RIP condition imposed
by the standard compressed sensing assumptions. However, if $D$ is a coherent frame, $AD$ does
not generally satisfy the standard RIP [48, 8]. Meanwhile, the mutual incoherence property (MIP)
[21] may not apply either, as it is very hard for $AD$ to satisfy the MIP when $D$ is highly
correlated.



1.3 $\ell_1$-analysis

An alternative to $\ell_1$-synthesis is $\ell_1$-analysis, which finds the estimator $\hat f$ directly by solving an
$\ell_1$-minimization problem. The two best-known analysis recovery algorithms proposed in the
literature are the analysis Basis Pursuit (ABP) [8] and the analysis LASSO (ALASSO) [25, 53]:

$$\text{(ABP)}: \quad \hat f = \operatorname*{argmin}_{f \in \mathbb{R}^n} \|D^* f\|_1 \quad \text{subject to} \quad \|Af - y\|_2 \le \varepsilon, \tag{1.2}$$

$$\text{(ALASSO)}: \quad \hat f^{AL} = \operatorname*{argmin}_{f \in \mathbb{R}^n} \tfrac{1}{2}\|Af - y\|_2^2 + \mu \|D^* f\|_1. \tag{1.3}$$

Here $\mu$ is a tuning parameter, and $\varepsilon$ is a measure of the noise level.

Several works in the literature are related to the analysis model (e.g. [251 EH El
[H [38l S3]). It has been shown that the $\ell_1$-analysis and $\ell_1$-synthesis approaches are exactly equivalent
when $D$ is orthogonal; otherwise, for example for a truly redundant dictionary, the two differ
markedly despite their apparent similarity [25]. Empirical evidence of the effectiveness of
the analysis approach can be found in [25] for signal denoising and in [50] for signal and image
restoration. Numerical algorithms have been proposed to solve the ALASSO, e.g. [31, 9, 39].

More recently, Candes et al. [8] showed that the ABP recovers a signal $f$ with an error bound

$$\|\hat f - f\|_2 \le C_0 \varepsilon + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}} \tag{1.4}$$

provided that $A$ satisfies a restricted isometry property adapted to $D$ (D-RIP) condition with
$\delta_{2s} < 0.08$, where $D$ is a tight frame for $\mathbb{R}^n$. Later, the D-RIP condition was improved to $\delta_{2s} < 0.493$
[37]. Here $x_{[s]}$ denotes the vector consisting of the $s$ largest coefficients of $x \in \mathbb{R}^d$ in
magnitude, i.e. $x_{[s]}$ is the best $s$-sparse approximation to the vector $x$. Following [8], Liu et al.
[38] provided a theoretical study of the error when the ABP is used in the context of compressed
sensing with general frames. Aldroubi et al. p] showed that the ABP is robust to measurement
noise, and stable with respect to perturbations of the measurement matrix $A$ and the general frame
$D$. Recall that the D-RIP of a measurement matrix $A$, which first appeared in [8] and is a natural
extension of the standard RIP, is defined as follows:
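The approximation term $\|x - x_{[s]}\|_1/\sqrt{s}$ appearing in (1.4) can be illustrated with a minimal sketch. The helper names `best_s_term` and `tail_term` and the sample coefficients are hypothetical, chosen only for illustration; the list `coeffs` plays the role of $D^* f$.

```python
import math

def best_s_term(x, s):
    """Return x_[s]: keep the s largest-magnitude entries of x, zero the rest."""
    # Indices sorted by decreasing magnitude; ties are broken by position.
    order = sorted(range(len(x)), key=lambda i: -abs(x[i]))
    keep = set(order[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]

def tail_term(x, s):
    """Return ||x - x_[s]||_1 / sqrt(s), the approximation term in (1.4)."""
    xs = best_s_term(x, s)
    return sum(abs(a - b) for a, b in zip(x, xs)) / math.sqrt(s)

coeffs = [3.0, -1.0, 0.5, -4.0]    # illustrative stand-in for D^* f
print(best_s_term(coeffs, 2))      # only the two largest entries survive
print(tail_term(coeffs, 2))        # vanishes exactly when coeffs is 2-sparse
```

When the coefficient vector is exactly $s$-sparse, the tail term is zero and the bound (1.4) reduces to $C_0 \varepsilon$.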

Definition 1.1 (D-RIP). Let $D$ be an $n \times d$ matrix. A measurement matrix $A$ is said to obey the
restricted isometry property adapted to $D$ (abbreviated as D-RIP) of order $s$ with constant $\delta$ if

$$(1 - \delta)\|Dv\|_2^2 \le \|ADv\|_2^2 \le (1 + \delta)\|Dv\|_2^2 \tag{1.5}$$

holds for all $s$-sparse vectors $v \in \mathbb{R}^d$. The D-RIP constant $\delta_s$ is defined as the smallest number $\delta$
such that (1.5) holds for all $s$-sparse vectors $v \in \mathbb{R}^d$.
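To get a feel for Definition 1.1, the following sketch samples the ratio $\|ADv\|_2^2 / \|Dv\|_2^2$ over random $s$-sparse $v$. It is an illustration under assumptions, not a way to certify the D-RIP: the frame $D = [I, I]/\sqrt{2}$, the dimensions, and all function names are hypothetical choices, and the largest sampled deviation from 1 only bounds $\delta_s$ from below, since $\delta_s$ is a supremum over all $s$-sparse vectors.

```python
import math
import random

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def sq_norm(v):
    return sum(a * a for a in v)

def doubled_identity_frame(n):
    """n x 2n tight frame D = [I, I] / sqrt(2), so that D D^* = I."""
    c = 1.0 / math.sqrt(2.0)
    return [[c if (j % n) == i else 0.0 for j in range(2 * n)] for i in range(n)]

def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]

def drip_ratios(A, D, s, trials, rng):
    """Sample ||A D v||_2^2 / ||D v||_2^2 over random s-sparse v."""
    d = len(D[0])
    ratios = []
    for _ in range(trials):
        v = [0.0] * d
        for j in rng.sample(range(d), s):
            v[j] = rng.gauss(0.0, 1.0)
        Dv = matvec(D, v)
        if sq_norm(Dv) > 1e-12:   # skip the (unlikely) case Dv = 0
            ratios.append(sq_norm(matvec(A, Dv)) / sq_norm(Dv))
    return ratios

rng = random.Random(0)
n, m, s = 40, 30, 3               # illustrative sizes only
D = doubled_identity_frame(n)
A = gaussian_matrix(m, n, rng)
r = drip_ratios(A, D, s, 200, rng)
# The largest sampled |ratio - 1| is only a lower estimate of delta_s.
print(max(abs(x - 1.0) for x in r))
```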

Note that we use the names ABP and ALASSO as the counterparts of BP and LASSO, respectively. If $D$ is,
in particular, the concatenation of a discrete derivative and a weighted identity, then the ALASSO is the Fused LASSO introduced
in [53].

As noted in [8], using a standard covering argument as in [3] (see also [15]), one can prove that
any $m \times n$ matrix $A$ obeying, for any fixed $v \in \mathbb{R}^n$,

$$P\left( \left| \|Av\|_2^2 - \|v\|_2^2 \right| \ge \delta \|v\|_2^2 \right) \le c\, e^{-\gamma m \delta^2}, \quad \delta \in (0, 1) \tag{1.6}$$

($\gamma$, $c$ are positive numerical constants) will satisfy the D-RIP $\delta_s < \delta$ with overwhelming probability
provided that $m \ge C\delta^{-2} s \log(d/s)$. Many types of random matrices satisfy (1.6). It is now well
known that matrices with Gaussian, sub-Gaussian, or Bernoulli entries satisfy (1.6) (e.g. [3]). It
has also been shown [H] that if the rows of $A$ are independent (scaled) copies of an isotropic $\psi_2$
vector, then $A$ also satisfies (1.6). Very recently, Ward and Krahmer [34] showed that randomizing
the column signs of any matrix that satisfies the standard RIP results in a matrix which satisfies the
Johnson-Lindenstrauss lemma. Therefore, nearly all random matrix constructions which satisfy the
standard RIP compressed sensing requirements will also satisfy the D-RIP. Consequently, a partial
random Fourier matrix (or partial circulant matrix) with randomized column signs will satisfy the
D-RIP, since these matrices are known to satisfy the RIP.

1.4 Motivation and contributions 

In this paper, following [8], we consider the recovery of signals which are (approximately) sparse in
terms of a tight frame from undersampled data. Formally, let $D$ be an $n \times d$ ($n \le d$) matrix whose
$d$ columns $D_1, \ldots, D_d$ form a tight frame for $\mathbb{R}^n$, i.e.

$$f = \sum_k \langle f, D_k \rangle D_k \quad \text{for all } f \in \mathbb{R}^n,$$

where $\langle \cdot, \cdot \rangle$ denotes the standard Euclidean inner product. Our objective in this paper is to reconstruct
the unknown signal $f \in \mathbb{R}^n$, where $D^* f$ is sparse or approximately sparse, from a collection of $m$
linear measurements corrupted with additive noise (1.1). Motivated by the DS, we propose
reconstruction by the following algorithm:

$$\text{(ADS)}: \quad \hat f^{ADS} = \operatorname*{argmin}_{f \in \mathbb{R}^n} \|D^* f\|_1 \quad \text{subject to} \quad \|D^* A^* (Af - y)\|_\infty \le \lambda. \tag{1.7}$$

We call this convex program the analysis Dantzig selector (ADS). It can be implemented efficiently
using convex programming. For the rest of this paper, unless otherwise stated, $D$ is an $n \times d$ tight frame and $\delta_s$ denotes
the D-RIP constant of order $s$ of the measurement matrix $A$.
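One way to see why the ADS can be solved efficiently is to rewrite (1.7) as a linear program. The formulation below is a standard reformulation and not taken from the paper; the auxiliary vector $t \in \mathbb{R}^d$ and the all-ones vector $\mathbf{1}$ are symbols introduced here only for this sketch.

```latex
\begin{aligned}
\min_{f \in \mathbb{R}^n,\ t \in \mathbb{R}^d} \quad & \sum_{i=1}^{d} t_i \\
\text{subject to} \quad & -t \le D^* f \le t, \\
& -\lambda \mathbf{1} \le D^* A^* (A f - y) \le \lambda \mathbf{1}.
\end{aligned}
```

At an optimum $t_i = |(D^* f)_i|$, so the objective equals $\|D^* f\|_1$, and the second pair of inequalities is exactly the $\ell_\infty$ constraint of (1.7). Both the objective and the constraints are linear in $(f, t)$.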

We first show that the ADS recovers a signal with an error bound

$$\|\hat f^{ADS} - f\|_2 \le C_0 \sqrt{s}\, \lambda + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}} \tag{1.8}$$

provided that $A$ satisfies the D-RIP with $\delta_{3s} < 1/2$ and that $\|D^* A^* z\|_\infty \le \lambda$, where $C_0$ and $C_1$
are small positive constants depending only on the D-RIP constant $\delta_{3s}$. As a special case, we
consider Gaussian noise $z \sim N(0, \sigma^2 I)$. Under a sparsity scenario in the case of Gaussian noise,
compared with the error bound derived for the ABP in the literature, e.g. [8, 38], the ADS yields a
bound that is adaptive to the unknown level of sparsity (with respect to $D$) of the object we try to
recover and thus provides a stronger guarantee when $s$ is small. Moreover, we derive a minimax lower bound
over the class of vectors that are at most $s$-sparse in terms of $D$, which tells us that the error
bound (1.8) under a sparsity scenario is in general unimprovable if one ignores the log-like factor.

To the best of our knowledge, there are few results on the performance of the ALASSO in the
literature related to compressed sensing. Our second contribution is to derive, for the ALASSO,
results similar to those for the ADS.

The ADS, the ALASSO and the ABP provide guarantees only for noise that is bounded or
bounded with high probability (for example, Gaussian noise). However, when the underlying
measurements are corrupted by sparse noise [36], such algorithms fail to recover a close approximation
of the signal. Our third contribution is to propose robust methods for
reconstructing signals which are nearly sparse in terms of a tight frame in the presence of bounded noise
combined with sparse (with respect to a tight frame) noise. Namely, we want to reconstruct the
unknown signal $f \in \mathbb{R}^n$, where $D^* f$ is sparse or approximately sparse, from a collection of $m$ linear
measurements

$$y = Af + z + e, \tag{1.9}$$

where $z$ is suitably bounded, $e$ is $s'$-sparse in terms of $\Omega$, and $\Omega \in \mathbb{R}^{m \times M}$ ($M \ge m$) is a tight frame
for $\mathbb{R}^m$. Let $\Phi = [A, I]$ and $u = [f^*, e^*]^*$. Denote

$$W = \begin{pmatrix} D & 0 \\ 0 & \Omega \end{pmatrix}.$$

Then one has $y = \Phi u + z$, and $W \in \mathbb{R}^{(n+m) \times (d+M)}$ is a tight frame for $\mathbb{R}^{n+m}$. We propose
the following three approaches: the separation ABP (SABP), the separation ADS (SADS) and the
separation ALASSO (SALASSO):

$$\text{(SABP)}: \quad \hat u^{SABP} = \operatorname*{argmin}_{\tilde u \in \mathbb{R}^{n+m}} \|W^* \tilde u\|_1 \quad \text{subject to} \quad \|\Phi \tilde u - y\|_2 \le \varepsilon, \tag{1.10}$$

$$\text{(SADS)}: \quad \hat u^{SADS} = \operatorname*{argmin}_{\tilde u \in \mathbb{R}^{n+m}} \|W^* \tilde u\|_1 \quad \text{subject to} \quad \|W^* \Phi^* (\Phi \tilde u - y)\|_\infty \le \lambda, \tag{1.11}$$

$$\text{(SALASSO)}: \quad \hat u^{SAL} = \operatorname*{argmin}_{\tilde u \in \mathbb{R}^{n+m}} \tfrac{1}{2}\|\Phi \tilde u - y\|_2^2 + \mu \|W^* \tilde u\|_1. \tag{1.12}$$

We will provide results on performance of these approaches in the case when the measurement 
matrix A is a Gaussian matrix or Sub-Gaussian matrix. Our analysis is based on the W-RIP. 

We shall restrict this work to the setting of real-valued signals $f \in \mathbb{R}^n$. For perspective, it is
known that compressed sensing results such as those for the BP ([6]) are also valid for complex-valued
signals $f \in \mathbb{C}^d$, e.g., [27].

1.5 Notation 

The following notation is used throughout this paper. The set of indices of the nonzero entries
of a vector $x$ is called the support of $x$ and denoted by $\operatorname{supp}(x)$. Denote $\|x\|_0 = |\operatorname{supp}(x)|$. For
$n \in \mathbb{N}$, $[n]$ denotes $\{1, 2, \ldots, n\}$. Given an index set $T \subset [n]$ and a matrix $A \in \mathbb{R}^{m \times n}$, $T^c$ is
the complement of $T$ in $[n]$, and $A_T$ is the submatrix of $A$ formed from the columns of $A$ indexed by
$T$. We write $A^*$ for the conjugate transpose of a matrix $A$, $A_T^*$ for $(A_T)^*$, $\lambda_{\min}(A^* A)$ and
$\lambda_{\max}(A^* A)$ for the smallest and largest eigenvalues of $A^* A$, and $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ for the
smallest and largest singular values of $A$. $\|A\|$ is the operator norm of $A$. $\|A\|_{p,q}$ denotes the norm
of $A$ from $\ell_p$ to $\ell_q$. For $j \in [n]$, $A_j$ is the $j$th column of $A$. $x_T$ is the vector equal to $x$ on $T$ and
zero elsewhere, or the restriction of $x$ to $T$. $C > 0$ (or $c$, $C_0$, $C_1$) denotes a universal constant
that might be different in each occurrence.



1.6 Organization 

This paper is organized as follows. In Section 2, we present stable recovery results for the ADS.
Similar results for the ALASSO are given in Section 3. Performance studies of the SABP, the SADS 
and the SALASSO are presented in Section 4. Section 5 contains the proofs of the main results. 



2 The analysis Dantzig selector 

In this section, we consider model (1.1), where $z$ is suitably bounded. In particular, $z$ can be Gaussian
noise. We will present the performance of the ADS, which only requires that $A$ satisfies the D-RIP.

Theorem 2.1. Let $D$ be an arbitrary $n \times d$ tight frame and let $A$ be an $m \times n$ measurement matrix
satisfying the D-RIP with $\delta_{3s} < 1/2$. Assume that $\lambda$ obeys $\|D^* A^* z\|_\infty \le \lambda$. Then the solution $\hat f^{ADS}$
to the ADS (1.7) obeys

$$\|\hat f^{ADS} - f\|_2 \le C_0 \sqrt{s}\, \lambda + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}},$$

where $C_0$ and $C_1$ are small constants depending only on the D-RIP constant $\delta_{3s}$.

Gaussian noise is essentially bounded.

Lemma 2.2. Let $D$ be an arbitrary $n \times d$ tight frame and let $A$ be an $m \times n$ matrix satisfying
the D-RIP with constant $\delta_1 \in (0, 1)$. Then for an arbitrary fixed constant $\alpha > 0$, the Gaussian error
$z \sim N(0, \sigma^2 I_m)$ satisfies

$$P\left( \|D^* A^* z\|_\infty \le \sigma \sqrt{2(1 + \alpha)(1 + \delta_1) \log d} \right) \ge 1 - \frac{1}{d^\alpha \sqrt{(1 + \alpha)\pi \log d}}.$$

Combining Lemma 2.2 ($\alpha = 1$) with Theorem 2.1 and noting that $\delta_1 \le \delta_{3s}$, we have the following
result.

Theorem 2.3. Let $D$ be an arbitrary $n \times d$ tight frame and let $A$ be an $m \times n$ measurement matrix
satisfying the D-RIP with $\delta_{3s} < 1/2$. Assume that $z \sim N(0, \sigma^2 I_m)$ and that $\hat f^{ADS}$ is the solution of
the ADS (1.7) with $\lambda = 2\sigma\sqrt{2 \log d}$. Then we have

$$\|\hat f^{ADS} - f\|_2 \le C_0 \sigma \sqrt{s \log d} + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}}$$

with probability at least $1 - 1/(d\sqrt{2\pi \log d})$, where $C_0$ and $C_1$ are small constants depending only
on $\delta_{3s}$.
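The step from Lemma 2.2 to Theorem 2.3 can be checked numerically. The sketch below, with hypothetical function names and illustrative values of $\sigma$, $d$ and $\delta_1$, evaluates the Lemma 2.2 threshold at $\alpha = 1$ and confirms that it does not exceed the choice $\lambda = 2\sigma\sqrt{2\log d}$ whenever $\delta_1 < 1/2$, since then $1 + \delta_1 < 2$.

```python
import math

def lemma_threshold(sigma, alpha, delta1, d):
    """sigma * sqrt(2 (1 + alpha) (1 + delta1) log d), the bound in Lemma 2.2."""
    return sigma * math.sqrt(2.0 * (1.0 + alpha) * (1.0 + delta1) * math.log(d))

def lemma_probability(alpha, d):
    """Lower bound 1 - 1 / (d^alpha sqrt((1 + alpha) pi log d)) from Lemma 2.2."""
    return 1.0 - 1.0 / (d ** alpha * math.sqrt((1.0 + alpha) * math.pi * math.log(d)))

def ads_lambda(sigma, d):
    """lambda = 2 sigma sqrt(2 log d), the choice made in Theorem 2.3."""
    return 2.0 * sigma * math.sqrt(2.0 * math.log(d))

sigma, d, delta1 = 0.1, 1000, 0.4   # illustrative values only
# With alpha = 1 and delta1 < 1/2 the Lemma 2.2 threshold is below lambda.
print(lemma_threshold(sigma, 1.0, delta1, d) <= ads_lambda(sigma, d))
# Probability with which ||D^* A^* z||_inf <= lambda is guaranteed.
print(lemma_probability(1.0, d))
```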

Remark 2.4. (a) In the exactly $s$-sparse case ($\|D^* f\|_0 \le s$), the above theorem implies

$$\|\hat f^{ADS} - f\|_2^2 \le C \cdot \log d \cdot s \sigma^2. \tag{2.1}$$

In particular, when $D = I$, that is, for standard compressed sensing, we obtain a result similar to
[15, Theorem 1.1] (see also [5, 17]). It was shown in [15] that the standard DS achieves a
loss within a logarithmic factor of the ideal mean squared error. The log-like factor is the price we
pay for adaptivity, that is, for not knowing ahead of time where the nonzero coefficients actually
are. In this sense, ignoring the log-like factor, the error bound (2.1) is in general unimprovable.

(b) The Gaussian error satisfies

$$P\left( \|z\|_2 \le \sigma \sqrt{m + 2\sqrt{m \log m}} \right) \ge 1 - \frac{1}{m}, \tag{2.2}$$

see [17, Lemma 1]. Combining this with (1.4), one can show that the solution $\hat f$ to the ABP (1.2)
with $\varepsilon = \sigma \sqrt{m + 2\sqrt{m \log m}}$ satisfies

$$\|\hat f - f\|_2 \le C_2 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}} + C_3 \sigma \sqrt{m + 2\sqrt{m \log m}} \tag{2.3}$$

with high probability provided that $A$ satisfies the D-RIP with $\delta_{2s} < 0.493$, where $C_2$ and $C_3$ are
small constants depending on $\delta_{2s}$. In particular, if $\|D^* f\|_0 \le s$, then

$$\|\hat f - f\|_2 \le C_3 \sigma \sqrt{m + 2\sqrt{m \log m}}. \tag{2.4}$$

Ignoring the D-RIP condition, the precise constants and the probabilities with which the stated
bounds hold, we observe that in the case when $m = O(s \log d)$, (2.4) and (2.1) appear to be
essentially the same. However, there is a subtle difference. Specifically, if $m$ and $n$ are fixed and we
consider the effect of varying $s$, we can see that the ADS yields a bound that is adaptive to this
change, providing a stronger guarantee when $s$ is small, whereas the bound in (2.4) does not improve
as $s$ is reduced. What is missing in (2.4) and is achieved here is the adaptivity to the unknown level
of sparsity (with respect to $D$) of the object we try to recover.
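The adaptivity argument in part (b) can be made concrete by comparing how the two bounds scale. The sketch below drops all constants, so only the dependence on $s$ and $m$ is meaningful; the function names and the numerical values are hypothetical.

```python
import math

def ads_scale(sigma, s, d):
    """sigma^2 * s * log d, the right-hand side of (2.1) without the constant."""
    return sigma ** 2 * s * math.log(d)

def abp_scale(sigma, m):
    """sigma^2 * (m + 2 sqrt(m log m)), the square of (2.4) without the constant."""
    return sigma ** 2 * (m + 2.0 * math.sqrt(m * math.log(m)))

sigma, m, d = 1.0, 500, 2000   # illustrative values only
for s in (1, 5, 20, 60):
    # The ADS scale shrinks with s, while the ABP scale depends on m alone.
    print(s, ads_scale(sigma, s, d), abp_scale(sigma, m))
```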

(c) Assume that the signal's transform coefficients in terms of $D$ decay like a power law, i.e.,
the $j$th largest entry of the vector $|D^* f|$ obeys

$$|D^* f|_{(j)} \le R \cdot j^{-1/p} \tag{2.5}$$

for some positive numbers $R$ and $p \le 1$. Such a model is appropriate for the wavelet frame
coefficients of a piecewise smooth signal, for example. Then with high probability, we have

$$\|\hat f^{ADS} - f\|_2^2 \le \min_{1 \le k \le s} C_0 \cdot \left( \sigma^2 k \log d + R^2 k^{-2/p + 1} \right).$$

(In this case, one can also compare this bound with the error estimates yielded by the ABP by
applying (2.5) to (2.3).) In the case of $D = I$, that is, for standard compressed sensing, we
obtain a result similar to [15, Theorem 1.3].
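The trade-off inside the minimum in part (c) can be explored with a small sketch: the noise term $\sigma^2 k \log d$ grows with $k$ while the approximation term $R^2 k^{-2/p+1}$ decays. The function name and parameter values are hypothetical, and the constant $C_0$ is dropped.

```python
import math

def power_law_bound(sigma, R, p, d, s):
    """Minimise sigma^2 k log d + R^2 k^(1 - 2/p) over 1 <= k <= s.

    Returns the pair (value, k) at the minimising k.
    """
    best = None
    for k in range(1, s + 1):
        val = sigma ** 2 * k * math.log(d) + R ** 2 * k ** (1.0 - 2.0 / p)
        if best is None or val < best[0]:
            best = (val, k)
    return best

# Illustrative values only: fast decay (p = 0.5) and moderate noise.
print(power_law_bound(sigma=0.05, R=1.0, p=0.5, d=1000, s=50))
```

The minimising $k$ balances the two terms, which is why the bound can be much smaller than the value obtained by simply taking $k = s$.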

(d) We have not tried to optimize the D-RIP condition. We expect that with a more complicated
proof as in ffjj] or [28, 16, 26, 40], one can still improve this condition.

The error bound (2.1) is within a log-like factor of the minimax risk over the class of vectors
that are at most $s$-sparse in terms of $D$:

Theorem 2.5. Let $D$ be an arbitrary $n \times d$ tight frame. Assume that the measurement matrix $A$
satisfies the D-RIP of order $s$ and that $z \sim N(0, \sigma^2 I_m)$. Suppose that there exists a subset $T_0 \subset [d]$
such that $|T_0| = s$ and $\Sigma_{T_0} \subset \{D^* f : f \in \mathbb{R}^n\}$, where $\Sigma_{T_0} = \{x \in \mathbb{R}^d : \operatorname{supp}(x) \subset T_0\}$. Then

$$\inf_{\hat f} \sup_{\|D^* f\|_0 \le s} E\|\hat f - f\|_2^2 \ge \frac{1}{1 + \delta_s}\, s \sigma^2,$$

where the infimum is over all measurable functions $\hat f(y)$ of $y$.

Remark 2.6. When $D$ is an identity matrix or an orthonormal basis, the condition $\Sigma_{T_0} \subset \{D^* f :
f \in \mathbb{R}^n\}$ is satisfied.

The exacting reader may argue that while this lower bound is in expectation, the upper bound
holds with high probability. Thus, we provide the following complementary theorem.

Theorem 2.7. Under the assumptions of Theorem 2.5, any estimator $\hat f(y)$ obeys

$$\sup_{\|D^* f\|_0 \le s} P\left( \|\hat f - f\|_2^2 \ge \frac{1}{2(1 + \delta_s)}\, s \sigma^2 \right) \ge 1 - e^{-s/16}.$$

3 The analysis LASSO 

In this section, we present the performance of the ALASSO from the noisy measurements (1.1),
where $z$ is suitably bounded. In particular, $z$ can be Gaussian noise. Note that our results are similar
to those for the ADS.

Theorem 3.1. Let $D$ be an arbitrary $n \times d$ tight frame and let $A$ be an $m \times n$ measurement matrix
satisfying the D-RIP with $\delta_{3s} < 1/4$. Assume that $\mu$ obeys $\|D^* A^* z\|_\infty \le \mu/2$. Then the solution $\hat f^{AL}$
to the ALASSO (1.3) obeys

$$\|\hat f^{AL} - f\|_2 \le C_0 \sqrt{s}\, \mu + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}},$$

where $C_1$ is a small constant depending only on $\delta_{3s}$ and $C_0$ depends on $\delta_{3s}$ and $\|D^* D\|_{1,1}$.

Combining Lemma 2.2 with Theorem 3.1, we have the following result.

Theorem 3.2. Let $D$ be an arbitrary $n \times d$ tight frame and let $A$ be an $m \times n$ measurement matrix
satisfying the D-RIP with $\delta_{3s} < 1/4$. Assume that $z \sim N(0, \sigma^2 I_m)$ and that $\hat f^{AL}$ is the solution of
the ALASSO with $\mu = 4\sigma\sqrt{2 \log d}$. Then we have

$$\|\hat f^{AL} - f\|_2 \le C_0 \sigma \sqrt{s \log d} + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s}}$$

with probability exceeding $1 - 1/(d\sqrt{2\pi \log d})$, where $C_1$ is a small constant depending only on $\delta_{3s}$ and
$C_0$ depends on $\delta_{3s}$ and $\|D^* D\|_{1,1}$.

Remark 3.3. (a) From the proof, one can see that $C_0 = 2\sqrt{2}(1 + 2\|D^* D\|_{1,1})/(1 - 4\delta_{3s})$. When $D$
is an identity matrix or an orthonormal basis, $\|D^* D\|_{1,1} = 1$. For a general tight frame $D$, we hope
that with a more delicate proof, the dependence on $\|D^* D\|_{1,1}$ can be removed.

(b) In the exactly $s$-sparse case ($\|D^* f\|_0 \le s$), the above theorem implies

$$\|\hat f^{AL} - f\|_2^2 \le C \cdot \log d \cdot s \sigma^2.$$

In particular, when $D = I$, that is, for standard compressed sensing, we obtain a result similar to
[3, Theorem 7.2].

4 Sparse noise 

In this section, we consider model (1.9), where $z$ is suitably bounded and $e$ is sparse in terms of a
tight frame $\Omega$.

Theorem 4.1. Let $D$ be an arbitrary $n \times d$ tight frame and let $A$ be an $m \times n$ matrix with elements
$a_{ij}$ drawn i.i.d. according to $N(0, 1/m)$. Let $\|\Omega^* e\|_0 \le s'$, where $\Omega \in \mathbb{R}^{m \times M}$ is a tight frame for $\mathbb{R}^m$.
Suppose $m \ge C\delta^{-2}(s + s') \log((d + M)/(s + s'))$ for some fixed $\delta \in (0, 1/4)$ and constant $C$.

(a) Let $\lambda$ obey $\|W^* \Phi^* z\|_\infty \le \lambda$. Then with high probability, the solution $\hat u^{SADS}$ to (1.11) obeys

$$\|\hat f^{SADS} - f\|_2 \le \|\hat u^{SADS} - u\|_2 \le C_0 \sqrt{s + s'}\, \lambda + C_1 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s + s'}}.$$

(b) Let $\mu$ obey $\|W^* \Phi^* z\|_\infty \le \mu/2$. Then with high probability, the solution $\hat u^{SAL}$ to (1.12)
obeys

$$\|\hat f^{SAL} - f\|_2 \le \|\hat u^{SAL} - u\|_2 \le C_2 (1 + 2\|D^* D\|_{1,1}) \sqrt{s + s'}\, \mu + C_3 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s + s'}}.$$

(c) Assume that $\|z\|_2 \le \varepsilon$. Then with high probability, the solution $\hat u^{SABP}$ to (1.10) obeys

$$\|\hat f^{SABP} - f\|_2 \le \|\hat u^{SABP} - u\|_2 \le C_4 \varepsilon + C_5 \frac{\|D^* f - (D^* f)_{[s]}\|_1}{\sqrt{s + s'}}.$$

In the above, $C_0, \ldots, C_5$ are small constants depending only on $\delta$.

Remark 4.2. (a) From the proof of this theorem, one can see that such results can be extended to
the more general class of sub-Gaussian matrices and to the case where $e$ is nearly sparse in terms of
$\Omega$.

(b) In the case of $z = 0$ and $\|D^* f\|_0 \le s$, the above theorem implies exact recovery (of both $f$ and
$e$) via

$$\hat u = \operatorname*{argmin}_{\tilde u \in \mathbb{R}^{n+m}} \|W^* \tilde u\|_1 \quad \text{subject to} \quad \Phi \tilde u = y.$$

In particular, when $\Omega = I$, we obtain a result similar to [36].

(c) By applying Lemma 2.2 (the proof of this theorem shows that $\Phi$ satisfies
the W-RIP) and (2.2) to the above theorem, one can get error estimates for the SABP, the SADS
and the SALASSO in the case of $z \sim N(0, \sigma^2 I)$.

5 Proofs 

We first recall some useful properties of a tight frame. Let $D$ be an arbitrary $n \times d$ tight frame for
$\mathbb{R}^n$. Then

$$\|f\|_2 = \|D^* f\|_2 \ \text{ for all } f \in \mathbb{R}^n, \quad \text{and} \quad \|Dv\|_2 \le \|v\|_2 \ \text{ for all } v \in \mathbb{R}^d.$$

We refer the reader to [18, Chapter 3] for details.
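Both properties can be verified by hand on a small example. The sketch below uses the three-vector "Mercedes-Benz" frame in $\mathbb{R}^2$, scaled so that $DD^* = I$; this particular frame and the helper names are illustrative choices, not objects used in the paper.

```python
import math

# Columns of the 2 x 3 "Mercedes-Benz" frame, scaled so that D D^* = I.
ANGLES = [math.pi / 2 + 2 * math.pi * k / 3 for k in range(3)]
SCALE = math.sqrt(2.0 / 3.0)
COLUMNS = [(SCALE * math.cos(t), SCALE * math.sin(t)) for t in ANGLES]

def analysis(f):
    """D^* f: inner products of f with the frame vectors."""
    return [c[0] * f[0] + c[1] * f[1] for c in COLUMNS]

def synthesis(v):
    """D v: linear combination of the frame vectors."""
    return (sum(vk * c[0] for vk, c in zip(v, COLUMNS)),
            sum(vk * c[1] for vk, c in zip(v, COLUMNS)))

def norm(v):
    return math.sqrt(sum(a * a for a in v))

f = (0.3, -1.2)
print(synthesis(analysis(f)))          # reproduces f up to rounding
print(norm(analysis(f)), norm(f))      # equal up to rounding
v = [1.0, 1.0, 1.0]                    # lies in the null space of D
print(norm(synthesis(v)), norm(v))     # ||Dv||_2 is strictly smaller here
```

The last line shows why the second property is only an inequality: a redundant frame has a nontrivial null space, so $D$ can shrink coefficient vectors that are not of the form $D^* f$.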

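5.1 Proof of Lemma 2.2

Proof of Lemma 2.2. Note that from the definition of the D-RIP, we have

$$\sqrt{1 - \delta_1}\, \|D_j\|_2 \le \|AD_j\|_2 \le \sqrt{1 + \delta_1}\, \|D_j\|_2 \le \sqrt{1 + \delta_1}, \quad \forall j \in [d]. \tag{5.1}$$

Without loss of generality, we assume that $\|D_j\|_2 \ne 0$ for each $j \in [d]$. Then by (5.1), we have
$\|AD_j\|_2 \ne 0$. Let $\omega_j = \frac{\langle AD_j, z \rangle}{\sigma \|AD_j\|_2}$. Then $\omega_j$ has Gaussian distribution $N(0, 1)$. By using the union
bound and then the inequality (5.1), we get

$$\begin{aligned}
P\left( \|D^* A^* z\|_\infty > \sigma \sqrt{2(1 + \alpha)(1 + \delta_1) \log d} \right)
&\le \sum_{j=1}^{d} P\left( |\omega_j| \, \|AD_j\|_2 > \sqrt{2(1 + \alpha)(1 + \delta_1) \log d} \right) \\
&\le \sum_{j=1}^{d} P\left( |\omega_j| > \sqrt{2(1 + \alpha) \log d} \right) \\
&= d \cdot P\left( |\omega_1| > \sqrt{2(1 + \alpha) \log d} \right) \le \frac{1}{d^\alpha \sqrt{(1 + \alpha)\pi \log d}},
\end{aligned}$$

where the last step follows from the Gaussian tail probability bound that for a standard Gaussian
variable $V$ and any constant $t > 0$, $P(|V| > t) \le 2 t^{-1} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$. It thus follows that

$$\begin{aligned}
P\left( \|D^* A^* z\|_\infty \le \sigma \sqrt{2(1 + \alpha)(1 + \delta_1) \log d} \right)
&= 1 - P\left( \|D^* A^* z\|_\infty > \sigma \sqrt{2(1 + \alpha)(1 + \delta_1) \log d} \right) \\
&\ge 1 - \frac{1}{d^\alpha \sqrt{(1 + \alpha)\pi \log d}}.
\end{aligned}$$

□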
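The Gaussian tail bound used in the last step, and the way it combines with the union bound, can be checked numerically. The sketch below uses `math.erfc` to evaluate the exact two-sided tail of a standard Gaussian; the function names are hypothetical.

```python
import math

def gaussian_two_sided_tail(t):
    """Exact P(|V| > t) for V ~ N(0, 1), via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))

def tail_bound(t):
    """The bound 2 t^{-1} (2 pi)^{-1/2} exp(-t^2 / 2) used in the proof (t > 0)."""
    return 2.0 / (t * math.sqrt(2.0 * math.pi)) * math.exp(-t * t / 2.0)

def union_bound(alpha, d):
    """d * tail_bound(sqrt(2 (1 + alpha) log d)), as in the union-bound step."""
    return d * tail_bound(math.sqrt(2.0 * (1.0 + alpha) * math.log(d)))

for t in (0.5, 1.0, 2.0, 4.0):
    # The exact tail never exceeds the bound; the bound is loose for small t.
    print(t, gaussian_two_sided_tail(t), tail_bound(t))
```

Evaluating `union_bound(alpha, d)` reproduces the closed form $1/(d^\alpha \sqrt{(1+\alpha)\pi \log d})$ in the lemma, because $e^{-t^2/2} = d^{-(1+\alpha)}$ at $t = \sqrt{2(1+\alpha)\log d}$.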



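5.2 Proof of Theorem 2.1

Proof of Theorem 2.1. The proof makes use of ideas from [8] [T5| El [10]. Let $f$ and $\hat f^{ADS}$ be as
in the theorem, and let $T_0 = T$ denote the index set of the $s$ largest coefficients of $D^* f$ in magnitude. Set
$h = \hat f^{ADS} - f$ and observe that, by the triangle inequality,

$$\|D^* A^* A h\|_\infty \le \|D^* A^* (Af - y)\|_\infty + \|D^* A^* (A \hat f^{ADS} - y)\|_\infty \le 2\lambda. \tag{5.2}$$

Since $\hat f^{ADS}$ is a minimizer, one gets that

$$\|D^* f\|_1 \ge \|D^* \hat f^{ADS}\|_1.$$

That is,

$$\|D_T^* f\|_1 + \|D_{T^c}^* f\|_1 \ge \|D_T^* \hat f^{ADS}\|_1 + \|D_{T^c}^* \hat f^{ADS}\|_1.$$

Thus

$$\|D_T^* f\|_1 + \|D_{T^c}^* f\|_1 \ge \|D_T^* f\|_1 - \|D_T^* h\|_1 + \|D_{T^c}^* h\|_1 - \|D_{T^c}^* f\|_1.$$

This implies

$$\|D_{T^c}^* h\|_1 \le 2\|D_{T^c}^* f\|_1 + \|D_T^* h\|_1. \tag{5.3}$$

Next, we decompose the coordinates $T_0^c$ into sets of size $s$ in order of decreasing magnitude of
$D_{T^c}^* h$. Denote these sets $T_1, T_2, \ldots$, and for simplicity of notation set $T_{01} = T_0 \cup T_1$. Note that for
each $j \ge 2$,

$$\|D_{T_j}^* h\|_2 \le s^{1/2} \|D_{T_j}^* h\|_\infty \le s^{-1/2} \|D_{T_{j-1}}^* h\|_1,$$

and thus

$$\sum_{j \ge 2} \|D_{T_j}^* h\|_2 \le \sum_{j \ge 1} s^{-1/2} \|D_{T_j}^* h\|_1 = s^{-1/2} \|D_{T^c}^* h\|_1. \tag{5.4}$$

Set $u_{01} = D_{T_{01}}^* h / \|D D_{T_{01}}^* h\|_2$ and $u_j = D_{T_j}^* h / \|D D_{T_j}^* h\|_2$ for each $j \ge 2$. Then $\|D u_{01}\|_2 = 1$ and
$\|D u_j\|_2 = 1$ for each $j \ge 2$. We then obtain that

$$\begin{aligned}
\frac{\langle A D D_{T_{01}}^* h, A D D_{T_j}^* h \rangle}{\|D D_{T_{01}}^* h\|_2 \|D D_{T_j}^* h\|_2}
&= \langle A D u_j, A D u_{01} \rangle = \frac{1}{4} \left\{ \|A D u_j + A D u_{01}\|_2^2 - \|A D u_j - A D u_{01}\|_2^2 \right\} \\
&\ge \frac{1}{4} \left\{ (1 - \delta_{3s}) \|D u_j + D u_{01}\|_2^2 - (1 + \delta_{3s}) \|D u_j - D u_{01}\|_2^2 \right\} \\
&= \langle D u_j, D u_{01} \rangle - \frac{\delta_{3s}}{2} \left\{ \|D u_j\|_2^2 + \|D u_{01}\|_2^2 \right\} = \langle D u_j, D u_{01} \rangle - \delta_{3s}.
\end{aligned}$$

It thus follows that

$$\begin{aligned}
\langle A h, A D D_{T_{01}}^* h \rangle
&= \langle A D D_{T_{01}}^* h, A D D_{T_{01}}^* h \rangle + \sum_{j \ge 2} \langle A D D_{T_j}^* h, A D D_{T_{01}}^* h \rangle \\
&\ge (1 - \delta_{3s}) \|D D_{T_{01}}^* h\|_2^2 - \delta_{3s} \|D D_{T_{01}}^* h\|_2 \sum_{j \ge 2} \|D D_{T_j}^* h\|_2 + \sum_{j \ge 2} \langle D D_{T_j}^* h, D D_{T_{01}}^* h \rangle.
\end{aligned}$$

By applying the equality

$$\sum_{j \ge 2} \langle D D_{T_j}^* h, D D_{T_{01}}^* h \rangle = \langle h - D D_{T_{01}}^* h, D D_{T_{01}}^* h \rangle = \|D_{T_{01}}^* h\|_2^2 - \|D D_{T_{01}}^* h\|_2^2$$

to the above inequality, we get

$$\begin{aligned}
\langle A h, A D D_{T_{01}}^* h \rangle
&\ge \|D_{T_{01}}^* h\|_2^2 - \delta_{3s} \|D D_{T_{01}}^* h\|_2^2 - \delta_{3s} \|D D_{T_{01}}^* h\|_2 \sum_{j \ge 2} \|D D_{T_j}^* h\|_2 \\
&\ge (1 - \delta_{3s}) \|D_{T_{01}}^* h\|_2^2 - \delta_{3s} \|D_{T_{01}}^* h\|_2 \sum_{j \ge 2} \|D_{T_j}^* h\|_2.
\end{aligned}$$

Substituting the inequality (5.4) into the above inequality, we derive

$$\langle A h, A D D_{T_{01}}^* h \rangle \ge (1 - \delta_{3s}) \|D_{T_{01}}^* h\|_2^2 - s^{-1/2} \delta_{3s} \|D_{T_{01}}^* h\|_2 \|D_{T^c}^* h\|_1.$$

Moreover, by using the Hölder inequality and (5.2), we have

$$\langle A h, A D D_{T_{01}}^* h \rangle = \langle D^* A^* A h, D_{T_{01}}^* h \rangle \le \|D^* A^* A h\|_\infty \|D_{T_{01}}^* h\|_1 \le 2\lambda \sqrt{2s}\, \|D_{T_{01}}^* h\|_2.$$

Combining the above two inequalities, an easy computation gives

$$\|D_{T_{01}}^* h\|_2 \le \frac{2\lambda\sqrt{2s} + s^{-1/2} \delta_{3s} \|D_{T^c}^* h\|_1}{1 - \delta_{3s}}. \tag{5.5}$$

It thus follows that

$$\|D_T^* h\|_1 \le \sqrt{s}\, \|D_T^* h\|_2 \le \sqrt{s}\, \|D_{T_{01}}^* h\|_2 \le \frac{2\sqrt{2}\lambda s + \delta_{3s} \|D_{T^c}^* h\|_1}{1 - \delta_{3s}}.$$

Substituting the above inequality into (5.3), an easy calculation gives

$$\|D_{T^c}^* h\|_1 \le \frac{2(1 - \delta_{3s}) \|D_{T^c}^* f\|_1 + 2\sqrt{2}\lambda s}{1 - 2\delta_{3s}}. \tag{5.6}$$

Now we are ready to give the error estimates. Note that

$$\|h\|_2 = \|D^* h\|_2 \le \|D_{T_{01}}^* h\|_2 + \sum_{j \ge 2} \|D_{T_j}^* h\|_2.$$

Inserting (5.4) and (5.5) into the above, we get

$$\|h\|_2 \le \frac{2\lambda\sqrt{2s} + s^{-1/2} \|D_{T^c}^* h\|_1}{1 - \delta_{3s}}.$$

By applying (5.6), we derive

$$\|h\|_2 \le \frac{4\sqrt{2s}\,\lambda}{1 - 2\delta_{3s}} + \frac{2\|D_{T^c}^* f\|_1}{(1 - 2\delta_{3s})\sqrt{s}},$$

which leads to the result, since $\|D_{T^c}^* f\|_1 = \|D^* f - (D^* f)_{[s]}\|_1$.

5.3 Proof of Theorem 2.5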

We first introduce the following well-known lemma, see for example [10, Lemma 3.11]. It gives the
minimax risk for estimating the vector $x \in \mathbb{R}^s$ from the data $y \in \mathbb{R}^m$ and the linear model

$$y = \Phi x + z, \tag{5.7}$$

where $\Phi \in \mathbb{R}^{m \times s}$ and $z \sim N(0, \sigma^2 I_m)$.

Lemma 5.1. Let $\Phi, x, y, z$ follow the linear model (5.7) and let $\lambda_i(\Phi^* \Phi)$ be the eigenvalues of the
matrix $\Phi^* \Phi$. Then

$$\inf_{\hat x} \sup_{x \in \mathbb{R}^s} E\|\hat x - x\|_2^2 = \sigma^2 \operatorname{trace}\left( (\Phi^* \Phi)^{-1} \right) = \sum_{i=1}^{s} \frac{\sigma^2}{\lambda_i(\Phi^* \Phi)},$$

where the infimum is over all measurable functions $\hat x(y)$ of $y$. In particular, if one of the eigenvalues
vanishes, then the minimax risk is unbounded.

Proof of Theorem \2.5\ From the definition of .D-RIP, for all v £ M s , we have 

\\AD To vg < (1 + S s )\\D To vf 2 < (1 + 6 s )\\v\\l 

It thus follows that 

X max (D* To A*AD To ) < 1 + 5 S . (5.8) 

Let v £ K s , ADt , y, z follow the linear model y = ADj> v + z, where z ~ iV(0, a 2 I m ). By using 
Lemma |5. II and (|5.8p . we have 

inf sup E||« - = £ YTn* °a* ad ) - TTT S ' ct2 ' (5 ' 9) 

where the infimum is over all measurable functions v(y) of y. Note that we have 

inf sup E||/-/|| 2 > inf sup E||/-/|| 2 
/ ||-D*/[|o<s / £>*/es To 

= inf sup E||L>7-L»7|||>inf sup E\\D* T J - D* T J\\l 
f r>*/es To / D*/es To 

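The function $D_{T_0}^*\hat f(y)$ is a measurable function of $y$ since $\hat f(y)$ is measurable. Then, with the assumption
$\Sigma_{T_0} \subset \{D^*f : f \in \mathbb{R}^n\}$, we get

$$\inf_{\hat f}\sup_{\|D^*f\|_0\le s} \mathbb{E}\|\hat f - f\|_2^2 \ge \inf_{\hat f}\sup_{D^*f\in\Sigma_{T_0}} \mathbb{E}\|D_{T_0}^*\hat f - D_{T_0}^*f\|_2^2 \ge \inf_{\hat v}\sup_{v\in\mathbb{R}^s} \mathbb{E}\|\hat v - v\|_2^2.$$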

Substituting (5.9) into the above inequality, we derive

$$\inf_{\hat f}\sup_{\|D^*f\|_0\le s} \mathbb{E}\|\hat f - f\|_2^2 \ge \frac{1}{1+\delta_s}\, s\,\sigma^2.$$

□

5.4 Proof of Theorem 2.7

We begin by introducing the following lemma, see [10, Lemma 3.14].

Lemma 5.2. Suppose that $x, y, \Phi, z$ follow the linear model (5.7) with $z \sim N(0, \sigma^2 I)$. Then

$$\inf_{\hat x}\sup_{x\in\mathbb{R}^s} \mathbb{P}\Big(\|\hat x - x\|_2^2 \ge \frac{1}{2\|\Phi\|^2}\, s\,\sigma^2\Big) \ge 1 - e^{-s/16}.$$



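Proof of Theorem 2.7. From the definition of a tight frame, we have

$$\begin{aligned}
\sup_{\|D^*f\|_0\le s} \mathbb{P}\Big(\|\hat f - f\|_2^2 \ge \frac{1}{2(1+\delta_s)}\, s\,\sigma^2\Big)
&= \sup_{\|D^*f\|_0\le s} \mathbb{P}\Big(\|D^*\hat f - D^*f\|_2^2 \ge \frac{1}{2(1+\delta_s)}\, s\,\sigma^2\Big) \\
&\ge \sup_{D^*f\in\Sigma_{T_0}} \mathbb{P}\Big(\|D^*\hat f - D^*f\|_2^2 \ge \frac{1}{2(1+\delta_s)}\, s\,\sigma^2\Big) \\
&\ge \sup_{D^*f\in\Sigma_{T_0}} \mathbb{P}\Big(\|D_{T_0}^*\hat f - D_{T_0}^*f\|_2^2 \ge \frac{1}{2(1+\delta_s)}\, s\,\sigma^2\Big) \\
&\ge \sup_{D^*f\in\Sigma_{T_0}} \mathbb{P}\Big(\|D_{T_0}^*\hat f - D_{T_0}^*f\|_2^2 \ge \frac{1}{2\|AD_{T_0}\|^2}\, s\,\sigma^2\Big),
\end{aligned}$$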

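where we have used (5.8) at the last step. Note that $D_{T_0}^*\hat f(y)$ is a measurable function of $y$ since $\hat f(y)$ is
measurable. Then, with the assumption $\Sigma_{T_0} \subset \{D^*f : f \in \mathbb{R}^n\}$, we get

$$\begin{aligned}
\sup_{\|D^*f\|_0\le s} \mathbb{P}\Big(\|\hat f - f\|_2^2 \ge \frac{1}{2(1+\delta_s)}\, s\,\sigma^2\Big)
&\ge \sup_{D^*f\in\Sigma_{T_0}} \mathbb{P}\Big(\|D_{T_0}^*\hat f - D_{T_0}^*f\|_2^2 \ge \frac{1}{2\|AD_{T_0}\|^2}\, s\,\sigma^2\Big) \\
&\ge \inf_{\hat v}\sup_{v\in\mathbb{R}^s} \mathbb{P}\Big(\|\hat v - v\|_2^2 \ge \frac{1}{2\|AD_{T_0}\|^2}\, s\,\sigma^2\Big) \\
&\ge 1 - e^{-s/16},
\end{aligned}$$

where the last step follows from Lemma 5.2. □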

5.5 Proof of Theorem 3.1

Proof of Theorem 3.1. The proof is similar to that of Theorem 2.1. Set $h = \hat f^{AL} - f$. We will prove
the following two inequalities:

• $\|D^*A^*(A\hat f^{AL} - y)\|_\infty \le \mu\|D^*D\|_{1,1}$.
• $\|D_{T^c}^* h\|_1 \le 3\|D_T^* h\|_1 + 4\|D_{T^c}^* f\|_1$.

With these two inequalities and the assumptions of this theorem, an approach similar to that for
Theorem 2.1 leads to our results.

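For convenience, we denote by $\mathcal{L}$ the functional

$$\mathcal{L}(f) = \frac{1}{2}\|Af - y\|_2^2 + \mu\|D^*f\|_1,$$

in which $\mu = 4\sigma\sqrt{2\log d}$. The subdifferential $\partial\mathcal{F}$ of a real-valued convex lower semicontinuous
function $\mathcal{F} : \mathbb{R}^n \to \mathbb{R}$ is the multifunction defined by

$$\partial\mathcal{F}(f_0) = \{g \in \mathbb{R}^n : \forall f \in \mathbb{R}^n,\ \mathcal{F}(f) \ge \mathcal{F}(f_0) + \langle g, f - f_0\rangle\}.$$

Note that $f_0$ is a minimum of $\mathcal{F}$ if and only if $0 \in \partial\mathcal{F}(f_0)$. The subdifferential of $\mathcal{L}$ at $\hat f^{AL}$ is

$$\partial\mathcal{L}(\hat f^{AL}) = \{A^*(A\hat f^{AL} - y) + \mu Dv \mid v \in \mathbb{R}^d :\ v_i = \mathrm{sgn}((D^*\hat f^{AL})_i) \text{ if } (D^*\hat f^{AL})_i \ne 0 \text{ and } |v_i| \le 1 \text{ otherwise}\}.$$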

Hence there exists $v \in \mathbb{R}^d$ with $\|v\|_\infty \le 1$ satisfying

$$A^*(A\hat f^{AL} - y) + \mu Dv = 0.$$

Now we get

$$\|D^*A^*(A\hat f^{AL} - y)\|_\infty = \mu\|D^*Dv\|_\infty \le \mu\|D^*D\|_{\infty,\infty} = \mu\|D^*D\|_{1,1}.$$
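The last two relations in this display can be checked numerically. The sketch below assumes that $\|\cdot\|_{\infty,\infty}$ and $\|\cdot\|_{1,1}$ denote the induced operator norms (maximum absolute row sum and maximum absolute column sum), which coincide for the symmetric matrix $D^*D$. The frame used here, two orthonormal bases stacked and rescaled, is an illustrative choice and not the frame of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6                                          # illustrative size

# Illustrative tight frame (an assumption): two orthonormal bases side by side,
# scaled so that D D^T = I.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
D = np.hstack([np.eye(n), Q]) / np.sqrt(2)
G = D.T @ D                                    # the d x d matrix D^* D, symmetric

norm_inf_inf = np.linalg.norm(G, np.inf)       # max absolute row sum
norm_1_1 = np.linalg.norm(G, 1)                # max absolute column sum

v = rng.uniform(-1.0, 1.0, size=G.shape[0])    # any v with ||v||_inf <= 1
lhs = np.linalg.norm(G @ v, np.inf)            # ||D^* D v||_inf
print(lhs, norm_inf_inf, norm_1_1)
```

For a real, non-symmetric matrix the two induced norms generally differ, so the equality in the display relies on the symmetry of $D^*D$.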
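Since $\hat f^{AL}$ is the minimizer of (1.3), we have

$$\frac{1}{2}\|A\hat f^{AL} - y\|_2^2 + \mu\|D^*\hat f^{AL}\|_1 \le \frac{1}{2}\|Af - y\|_2^2 + \mu\|D^*f\|_1.$$

Plugging in $y = Af + z$ and rearranging terms gives

$$\frac{1}{2}\|Ah\|_2^2 + \mu\|D^*\hat f^{AL}\|_1 \le \langle Ah, z\rangle + \mu\|D^*f\|_1.$$

Note that from the definition of a tight frame, and then by using the Hölder inequality and the
assumption $\|D^*A^*z\|_\infty \le \mu/2$, we have

$$\langle Ah, z\rangle + \mu\|D^*f\|_1 = \langle D^*h, D^*A^*z\rangle + \mu\|D^*f\|_1 \le \|D^*h\|_1\|D^*A^*z\|_\infty + \mu\|D^*f\|_1 \le \frac{\mu}{2}\|D^*h\|_1 + \mu\|D^*f\|_1.$$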

It thus follows that

$$\mu\|D^*\hat f^{AL}\|_1 \le \frac{1}{2}\|Ah\|_2^2 + \mu\|D^*\hat f^{AL}\|_1 \le \frac{\mu}{2}\|D^*h\|_1 + \mu\|D^*f\|_1.$$

This gives

$$\|D^*\hat f^{AL}\|_1 \le \|D^*h\|_1/2 + \|D^*f\|_1.$$
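The tight-frame identity $\langle Ah, z\rangle = \langle D^*h, D^*A^*z\rangle$ and the Hölder step used in this argument can be illustrated numerically. The following is a minimal sketch under stated assumptions: the frame (two stacked orthonormal bases, rescaled), the matrix `A`, the vectors `h` and `z`, and all sizes are random illustrative choices, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 5                                   # illustrative sizes

# Illustrative tight frame D (n x 2n) with D D^T = I.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
D = np.hstack([np.eye(n), Q]) / np.sqrt(2)

A = rng.standard_normal((m, n)) / np.sqrt(m)
h = rng.standard_normal(n)
z = rng.standard_normal(m)

lhs = (A @ h) @ z                             # <Ah, z>
rhs = (D.T @ h) @ (D.T @ (A.T @ z))           # <D^* h, D^* A^* z>

# Hoelder: |<a, b>| <= ||a||_1 * ||b||_inf
holder = np.linalg.norm(D.T @ h, 1) * np.linalg.norm(D.T @ (A.T @ z), np.inf)

# Tight frame also preserves the l2 norm: ||D^* h||_2 = ||h||_2
norm_gap = abs(np.linalg.norm(D.T @ h) - np.linalg.norm(h))
print(lhs, rhs, holder, norm_gap)
```

Both identities rely only on $DD^* = I$; the columns of $D$ may be highly coherent without affecting them.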



Now a similar argument as that for (5.3) leads to

$$\|D_{T^c}^* h\|_1 \le 3\|D_T^* h\|_1 + 4\|D_{T^c}^* f\|_1. \tag{5.10}$$
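The argument behind (5.10) can be sketched as follows. It uses only $\hat f^{AL} = f + h$, the triangle inequality on $T$ and $T^c$, and the bound $\|D^*\hat f^{AL}\|_1 \le \|D^*h\|_1/2 + \|D^*f\|_1$ obtained just before; the steps are written out here as an illustration and hold for any index set $T$.

```latex
\begin{align*}
\|D^*\hat f^{AL}\|_1 = \|D^*f + D^*h\|_1
 &\ge \|D_T^* f\|_1 - \|D_T^* h\|_1 + \|D_{T^c}^* h\|_1 - \|D_{T^c}^* f\|_1, \\
\|D^*\hat f^{AL}\|_1
 &\le \tfrac12\|D_T^* h\|_1 + \tfrac12\|D_{T^c}^* h\|_1 + \|D_T^* f\|_1 + \|D_{T^c}^* f\|_1 .
\end{align*}
% Combining the two lines and cancelling \|D_T^* f\|_1:
\[
\tfrac12\|D_{T^c}^* h\|_1 \le \tfrac32\|D_T^* h\|_1 + 2\|D_{T^c}^* f\|_1 ,
\]
% which is (5.10) after multiplying by 2.
```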

Now we sketch the important steps of the proof. Similar to (5.2), we have

$$\|D^*A^*Ah\|_\infty \le \|D^*A^*(Af - y)\|_\infty + \|D^*A^*(A\hat f^{AL} - y)\|_\infty \le c_0\mu,$$

where $c_0 = 1/2 + \|D^*D\|_{1,1}$. With the above inequality, a similar argument as that for (5.5) gives

$$\|D_{T_{01}}^* h\|_2 \le \frac{c_0\mu\sqrt{2s} + s^{-1/2}\delta_{3s}\|D_{T^c}^* h\|_1}{1-\delta_{3s}}. \tag{5.11}$$



It thus follows that

$$\|D_T^* h\|_1 \le \sqrt{s}\,\|D_T^* h\|_2 \le \sqrt{s}\,\|D_{T_{01}}^* h\|_2 \le \frac{\sqrt{2}\,c_0\mu s + \delta_{3s}\|D_{T^c}^* h\|_1}{1-\delta_{3s}}.$$

Substituting the above inequality into (5.10), an easy calculation gives

$$\|D_{T^c}^* h\|_1 \le \frac{4(1-\delta_{3s})\|D_{T^c}^* f\|_1 + 3\sqrt{2}\,c_0\mu s}{1-4\delta_{3s}}. \tag{5.12}$$

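Using (5.11) and (5.4), and then applying (5.12), we get

$$\|h\|_2 = \|D^*h\|_2 \le \|D_{T_{01}}^* h\|_2 + \sum_{i\ge 2}\|D_{T_i}^* h\|_2 \le \frac{c_0\mu\sqrt{2s} + s^{-1/2}\|D_{T^c}^* h\|_1}{1-\delta_{3s}} \le \frac{4\sqrt{2s}\,c_0\mu}{1-4\delta_{3s}} + \frac{4\|D_{T^c}^* f\|_1}{(1-4\delta_{3s})\sqrt{s}},$$

which leads to the result. □

5.6 Proof of Theorem 4.1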

We introduce the following result, see [36, Lemma 1]. As shown in [36], such results can be
extended with different constants to the more general class of sub-Gaussian matrices.

Lemma 5.3. Let $A$ be an $m \times n$ matrix with elements $a_{ij}$ drawn i.i.d. according to $N(0, 1/m)$ and
let $\Phi = [A, I]$. Then for every $v \in \mathbb{R}^{m+n}$,

$$\mathbb{P}\Big(\big|\|\Phi v\|_2^2 - \|v\|_2^2\big| \ge 2\delta\|v\|_2^2\Big) \le 3e^{-m\delta^2/8}, \qquad \delta \in (0,1). \tag{5.13}$$
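A small simulation can illustrate the concentration statement (5.13). The sketch below fixes one unit vector $v$, draws fresh Gaussian matrices $A$, and counts how often $\|\Phi v\|_2^2$ deviates from $\|v\|_2^2$ by at least $2\delta\|v\|_2^2$. The sizes, seed, trial count and function name are illustrative assumptions, and the experiment is a sanity check on one vector rather than a verification of the lemma.

```python
import numpy as np

def deviation_frequency(m, n, delta, trials, rng):
    """Fraction of draws of Phi = [A, I] with | ||Phi v||^2 - ||v||^2 | >= 2*delta*||v||^2."""
    v = rng.standard_normal(n + m)
    v /= np.linalg.norm(v)                             # fixed unit vector, so ||v||^2 = 1
    v_sig, v_noise = v[:n], v[n:]                      # parts multiplied by A and by I
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((m, n)) / np.sqrt(m)   # entries ~ N(0, 1/m)
        phi_v = A @ v_sig + v_noise                    # [A, I] v without forming Phi
        if abs(phi_v @ phi_v - 1.0) >= 2 * delta:
            hits += 1
    return hits / trials

rng = np.random.default_rng(2)
m, n, delta = 200, 50, 0.5                             # illustrative values
freq = deviation_frequency(m, n, delta, trials=1000, rng=rng)
bound = 3 * np.exp(-m * delta**2 / 8)
print(freq, bound)
```

The observed frequency should not exceed the bound $3e^{-m\delta^2/8}$; for small $m$ or small $\delta$ the bound exceeds 1 and carries no information.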

Proof of Theorem 4.1. Under the assumptions in the theorem statement, by Lemma 5.3 we have
that (5.13) holds for every $v \in \mathbb{R}^{m+n}$. Using a standard covering argument as in [3] (see also [48]),
one can prove that, with probability exceeding $1 - 3e^{-c_2 m}$, $\Phi$ satisfies the W-RIP of order $s + s'$
with constant $\delta$. Then, the conclusions follow from Theorem 2.1, Theorem 3.1, (1.4) and the fact that

$$\|W^*u - (W^*u)_{[s+s']}\|_1 \le \|D^*f - (D^*f)_{[s]}\|_1 + \|\Omega^*e - (\Omega^*e)_{[s']}\|_1.$$

□